Understanding Big Data by IBM Paul Zikopoulos
Author:IBM Paul Zikopoulos
Language: eng
Format: epub
Publisher: McGraw-Hill Education
Published: 2012-09-14T16:00:00+00:00
The Basics of MapReduce
MapReduce is the heart of Hadoop. It is this programming paradigm that allows for massive scalability across hundreds or thousands of servers in a Hadoop cluster. The MapReduce concept is fairly simple to understand for those who are familiar with clustered scale-out data processing solutions. For people new to this topic, it can be somewhat difficult to grasp, because it’s not typically something people have been exposed to previously. If you’re new to Hadoop’s MapReduce jobs, don’t worry: we’re going to describe it in a way that gets you up to speed quickly.
The term MapReduce actually refers to two separate and distinct tasks that Hadoop programs perform. The first is the map job, which takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs). The reduce job takes the output from a map as input and combines those data tuples into a smaller set of tuples. As the sequence of the name MapReduce implies, the reduce job is always performed after the map job.
Let’s look at a simple example. Assume you have five files, and each file contains two columns (a key and a value in Hadoop terms) that represent a city and the corresponding temperature recorded in that city for the various measurement days. Of course we’ve made this example very simple so it’s easy to follow. You can imagine that a real application won’t be quite so simple, as it’s likely to contain millions or even billions of rows, and they might not be neatly formatted rows at all; in fact, no matter how big or small the amount of data you need to analyze, the key principles we’re covering here remain the same. Either way, in this example, city is the key and temperature is the value.
The following snippet shows a sample of the data from one of our test files (incidentally, in case the temperatures have you reaching for a hat and gloves, they are in Celsius):
Download
This site does not store any files on its server. We only index and link to content provided by other sites. Please contact the content providers to delete copyright contents if any and email us, we'll remove relevant links or contents immediately.
Sass and Compass in Action by Wynn Netherland Nathan Weizenbaum Chris Eppstein Brandon Mathis(7809)
Grails in Action by Glen Smith Peter Ledbrook(7719)
Configuring Windows Server Hybrid Advanced Services Exam Ref AZ-801 by Chris Gill(6829)
Azure Containers Explained by Wesley Haakman & Richard Hooper(6828)
Running Windows Containers on AWS by Marcio Morales(6355)
Kotlin in Action by Dmitry Jemerov(5090)
Microsoft 365 Identity and Services Exam Guide MS-100 by Aaron Guilmette(5064)
Combating Crime on the Dark Web by Nearchos Nearchou(4640)
Microsoft Cybersecurity Architect Exam Ref SC-100 by Dwayne Natwick(4603)
Management Strategies for the Cloud Revolution: How Cloud Computing Is Transforming Business and Why You Can't Afford to Be Left Behind by Charles Babcock(4437)
The Ruby Workshop by Akshat Paul Peter Philips Dániel Szabó and Cheyne Wallace(4331)
The Age of Surveillance Capitalism by Shoshana Zuboff(3979)
Python for Security and Networking - Third Edition by José Manuel Ortega(3890)
The Ultimate Docker Container Book by Schenker Gabriel N.;(3550)
Learn Windows PowerShell in a Month of Lunches by Don Jones(3528)
Learn Wireshark by Lisa Bock(3520)
Mastering Python for Networking and Security by José Manuel Ortega(3376)
Mastering Azure Security by Mustafa Toroman and Tom Janetscheck(3355)
Blockchain Basics by Daniel Drescher(3324)
